Storing and Querying Historical Texts in a Relational Database

Authors

  • Lukas C. Faulstich
  • Ulf Leser
  • Anke Lüdeling

Abstract

This paper describes an approach for storing and querying a large corpus of linguistically annotated historical texts in a relational database management system. Texts in such a corpus have a complex structure consisting of multiple text layers that are richly annotated and aligned to each other. Modeling and managing such corpora poses various challenges not present in simpler text collections. In particular, it is difficult to design, and to implement efficiently, a query language for such complex annotation structures that fulfills the requirements of linguists and philologists. In this paper, we describe steps toward solving this task. We describe a model for storing arbitrarily complex linguistic annotation schemes for text. The text itself may be present in various transliterations, transcriptions, or editions. We identify the main requirements for a query language on linguistic annotations in this scenario. From these requirements, we derive fundamental query operators and sketch their implementation in our model. Furthermore, we discuss initial ideas for improving the efficiency of an implementation based on relational databases and XML techniques.
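
To make the idea of aligned, annotated text layers in a relational database more concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the authors' model: the table names (layer, token, annotation, alignment), the helper functions aligned_forms_with_tag and followed_by, and the sample tokens are all hypothetical and purely illustrative. The sketch assumes a stand-off design in which each text layer (for example a transcription and an edition of the same passage) is stored as a sequence of tokens, and annotations and cross-layer alignments reference token ids. Two query operators are shown: following an alignment between layers, and matching a sequence of adjacent tokens within one layer.

```python
import sqlite3

# Hypothetical stand-off schema (illustrative only): each text layer is a
# sequence of tokens; annotations and alignments point at token ids instead
# of being embedded in the text itself.
SCHEMA = """
CREATE TABLE layer (id INTEGER PRIMARY KEY, name TEXT UNIQUE);
CREATE TABLE token (
    id INTEGER PRIMARY KEY,
    layer_id INTEGER REFERENCES layer(id),
    idx INTEGER,          -- position of the token within its layer
    form TEXT
);
CREATE TABLE annotation (
    token_id INTEGER REFERENCES token(id),
    tier TEXT,            -- annotation scheme, e.g. 'pos'
    value TEXT
);
CREATE TABLE alignment (
    src INTEGER REFERENCES token(id),  -- token in the transcription layer
    dst INTEGER REFERENCES token(id)   -- corresponding token in the edition
);
"""


def build_demo(conn):
    """Load a tiny, made-up two-layer example."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO layer VALUES (?, ?)",
                     [(1, "transcription"), (2, "edition")])
    conn.executemany("INSERT INTO token VALUES (?, ?, ?, ?)",
                     [(1, 1, 0, "vnde"), (2, 1, 1, "kunig"),
                      (3, 2, 0, "unde"), (4, 2, 1, "künic")])
    # Only the edition layer carries part-of-speech annotations here.
    conn.executemany("INSERT INTO annotation VALUES (?, ?, ?)",
                     [(3, "pos", "KON"), (4, "pos", "NN")])
    conn.executemany("INSERT INTO alignment VALUES (?, ?)",
                     [(1, 3), (2, 4)])


def aligned_forms_with_tag(conn, tier, value):
    """Alignment operator: return (transcription form, edition form) pairs
    whose edition token carries the annotation tier = value."""
    sql = """
    SELECT t_src.form, t_dst.form
    FROM annotation a
    JOIN token t_dst ON t_dst.id = a.token_id
    JOIN alignment al ON al.dst = t_dst.id
    JOIN token t_src ON t_src.id = al.src
    WHERE a.tier = ? AND a.value = ?
    ORDER BY t_src.idx
    """
    return conn.execute(sql, (tier, value)).fetchall()


def followed_by(conn, first_tag, second_tag):
    """Sequence operator: return pairs of adjacent tokens in the same layer
    whose 'pos' annotations are first_tag and second_tag, in that order."""
    sql = """
    SELECT t1.form, t2.form
    FROM token t1
    JOIN token t2 ON t2.layer_id = t1.layer_id AND t2.idx = t1.idx + 1
    JOIN annotation a1 ON a1.token_id = t1.id AND a1.tier = 'pos'
    JOIN annotation a2 ON a2.token_id = t2.id AND a2.tier = 'pos'
    WHERE a1.value = ? AND a2.value = ?
    ORDER BY t1.layer_id, t1.idx
    """
    return conn.execute(sql, (first_tag, second_tag)).fetchall()


conn = sqlite3.connect(":memory:")
build_demo(conn)
print(aligned_forms_with_tag(conn, "pos", "NN"))  # [('kunig', 'künic')]
print(followed_by(conn, "KON", "NN"))             # [('unde', 'künic')]
```

Keeping annotations and alignments in separate tables, rather than inline in the text, is what lets several independent annotation schemes and several text versions coexist over the same tokens. The cost is that each query operator becomes a chain of self-joins, which is one reason efficiency is a concern for such an implementation.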

Similar resources

Implementing a Linguistic Query Language for Historic Texts

We describe the design and implementation of the linguistic query language DDDquery. This language aims at querying a large linguistic database storing a corpus of richly annotated historic German texts. We use a graph-based data model that supports multiple independent annotation layers on a shared text layer as well as alignments of text layers representing the same text or related texts (e.g...

BtSQL: nested bitemporal relational database query language

A nested bitemporal relational data model and its query language are implemented. The bitemporal atom (BTA) is the fundamental construct to represent temporal data and it contains 5 components: a value, the lower and upper bounds of valid time, and the lower and upper bounds of the recording time. We consider 2 types of data structures for storing BTAs: 1) string representation and 2) abstract d...

Storing and Querying XML Documents Without Using Schema Information

As the popularity of eXtensible Markup Language (XML) continues to increase at an astonishing pace, data management systems for storing and querying large repositories of XML data are urgently needed. In this paper, we investigate using a Relational Database Management System (RDBMS) for storing and querying XML data. We present a mapping scheme, called PAID, for mapping XML documents to relati...

Data Centric Integrated Framework on Hotel Industry Bridging XML to Relational Database

eXtensible Markup Language (XML) is a promising Internet standard for data representation and data exchange due to its flexible structure to share common information and data on the World Wide Web [1]. Hence, it is vital to have a competent and effective way of storing and querying XML documents. There are three main approaches to store XML data [2][3][4][5]. First, storing XML data in repositories de...

Storing and Querying Probabilistic XML Using a Probabilistic Relational DBMS

This work explores the feasibility of storing and querying probabilistic XML in a probabilistic relational database. Our approach is to adapt known techniques for mapping XML to relational data such that the possible worlds are preserved. We show that this approach can work for any XML-to-relational technique by adapting a representative schema-based (inlining) as well as a representative schem...

Publication date: 2005